November 20th, 2017

Overview

  1. Problem Overview
  2. What Data is Used?
  3. Regression Analysis
  4. Spatial Analysis

Problem Overview

Question 1

Does opening a new station affect the ridership of existing stations near it?

  • Could not frame this as a 'spatial data' problem; it looks more like a distance problem

Question 2

How does proximity to NYC subway stations affect Citibike ridership?

  • The NACTO bike share guide recommends placing bike stations near public transit.
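
One way to quantify "proximity" is the great-circle distance from each bike station to its closest subway entrance. The sketch below is illustrative only: the function names and the coordinates are hypothetical and not taken from the analysis, and the haversine formula is an assumed choice of distance measure.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_entrance_m(station, entrances):
    """Distance in metres from one bike station to its closest subway entrance."""
    return min(haversine_m(station[0], station[1], lat, lon)
               for lat, lon in entrances)


# Hypothetical coordinates, for illustration only (not from the data).
bike_station = (40.7300, -73.9900)
subway_entrances = [(40.7400, -73.9900), (40.7300, -74.0000)]
print(round(nearest_entrance_m(bike_station, subway_entrances)))
```

Over distances of a few hundred metres a projected coordinate system would give nearly the same answer; haversine is used here only because it needs no extra libraries.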

Data Sources

Citibike Data

Source

Ridership Data

Observations: 9,884,307
Variables: 11
$ tripduration     <dbl> 997, 1904, 305, 250, 464, 1118, 394, 1449, 42...
$ starttime        <dttm> 2013-07-01 06:00:16, 2013-07-01 06:00:30, 20...
$ stoptime         <dttm> 2013-07-01 06:16:53, 2013-07-01 06:32:14, 20...
$ startstationid   <chr> "436", "294", "385", "271", "477", "488", "30...
$ startstationname <chr> "Hancock St & Bedford Ave", "Washington Squar...
$ endstationid     <chr> "467", "375", "440", "390", "522", "497", "32...
$ endstationname   <chr> "Dean St & 4 Ave", "Mercer St & Bleecker St",...
$ bikeid           <dbl> 16199, 20281, 18143, 16370, 15497, 15502, 161...
$ usertype         <chr> "Subscriber", "Subscriber", "Subscriber", "Su...
$ birthyear        <dbl> 1979, 1949, 1988, 1962, 1975, 1957, 1963, 195...
$ gender           <chr> "2", "1", "1", "1", "1", "1", "2", "1", "1", ...

Citibike Data

Summary Statistics

Time: 6 - 10 am weekday mornings
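
A minimal sketch of how the weekday-morning restriction could be applied before counting rides per station and day. It assumes the filter is on trip start time and that the window is 06:00 inclusive to 10:00 exclusive; neither detail is stated in the slides. The function names and sample rows are hypothetical.

```python
from collections import Counter
from datetime import datetime


def is_weekday_morning(ts):
    """True for Monday-Friday timestamps from 06:00 up to (not including) 10:00."""
    return ts.weekday() < 5 and 6 <= ts.hour < 10


def daily_ride_counts(trips):
    """Count qualifying rides per (start station id, calendar date)."""
    counts = Counter()
    for start_station_id, start_time in trips:
        if is_weekday_morning(start_time):
            counts[(start_station_id, start_time.date())] += 1
    return counts


# Illustrative rows only: (startstationid, starttime)
trips = [
    ("436", datetime(2013, 7, 1, 6, 0, 16)),  # Monday morning: kept
    ("436", datetime(2013, 7, 1, 9, 59, 0)),  # Monday morning: kept
    ("436", datetime(2013, 7, 1, 10, 0, 0)),  # 10:00 exactly: dropped
    ("294", datetime(2013, 7, 6, 7, 30, 0)),  # Saturday: dropped
]
counts = daily_ride_counts(trips)
for (station, day), n in sorted(counts.items()):
    print(station, day, n)
```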

Date Range:

[1] "2013-07-01 UTC" "2017-07-31 UTC"

Stations:

[1] 749

Citibike Data

Exploration

Citibike Data

Exploration (Cont.)

Citibike Data

  • Only the stations labeled in red will be used in the analysis.

  • These stations have the longest ridership history, and their surrounding areas are more homogeneous.

  • Start the analysis in 2014, leaving the first 6 months of data (July to December 2013) as a "burn-in" period

Historic Weather Data

NOAA Data Request

Observations: 1,673
Variables: 6
$ day_trip <dttm> 2013-01-01, 2013-01-02, 2013-01-03, 2013-01-04, 2013...
$ PRCP     <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00,...
$ SNOW     <dbl> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0...
$ SNWD     <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
$ TMIN     <int> 26, 22, 24, 30, 32, 34, 37, 35, 39, 40, 37, 42, 43, 3...
$ TMAX     <int> 40, 33, 32, 37, 42, 46, 45, 48, 49, 47, 46, 47, 50, 5...

Subway Entrance Data

Regression Analysis

Model Definition

  • Fit a hierarchical Poisson model to control for variability throughout the year.
  • Outcome: number of rides

Model terms:

  • Fixed effects: temperature, snow, days since Jan 1, 2014, previous day's ride count
  • Random effects: station, week of year
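
One plausible way to write the model described above, for station s on day t. The slides only name the terms; the log link, the normally distributed random intercepts, the lagged count entering linearly, and all symbols are assumptions made here for illustration.

```latex
\begin{aligned}
y_{s,t} &\sim \mathrm{Poisson}(\lambda_{s,t}) \\
\log \lambda_{s,t} &= \beta_0
  + \beta_1\,\mathrm{temp}_t
  + \beta_2\,\mathrm{snow}_t
  + \beta_3\,d_t
  + \beta_4\,y_{s,t-1}
  + u_s + v_{w(t)} \\
u_s &\sim \mathcal{N}(0, \sigma_u^2), \qquad
v_w \sim \mathcal{N}(0, \sigma_v^2)
\end{aligned}
```

Here y_{s,t} is the number of rides, d_t is the number of days since Jan 1, 2014, u_s is the station effect, and v_{w(t)} is the effect for the week of the year containing day t.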

Model Performance

Specific Group of Stations

Spatial Analysis

Models

Is proximity to subway stations associated with a decrease in Citibike usage?

Fitting the Variogram
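
For reference, a minimal sketch of the empirical semivariogram that a variogram model is typically fitted to, using the classical (Matheron) estimator. It assumes station coordinates are already projected to a planar system and that `values` holds one number per station (for example a mean ride count or a model residual); the slides do not say which quantity was used, and the function name is hypothetical.

```python
import math
from collections import defaultdict


def empirical_semivariogram(points, values, bin_width, max_dist):
    """Return {bin lower edge: semivariance} from all point pairs up to max_dist."""
    sums = defaultdict(float)   # sum of squared differences per distance bin
    pairs = defaultdict(int)    # number of point pairs per distance bin
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.dist(points[i], points[j])
            if h > max_dist:
                continue
            b = int(h // bin_width)
            sums[b] += (values[i] - values[j]) ** 2
            pairs[b] += 1
    # Semivariance is half the mean squared difference within each bin.
    return {b * bin_width: sums[b] / (2 * pairs[b]) for b in sorted(pairs)}


# Toy data, for illustration only.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
values = [1.0, 2.0, 4.0]
gamma = empirical_semivariogram(points, values, bin_width=1.0, max_dist=3.0)
for lag, g in gamma.items():
    print(lag, g)
```

A parametric model (spherical, exponential, and so on) would then be fitted to these binned values; that step is not shown here.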

No Regressors

Closest Subway Features